Japanese-Spanish Thesaurus Construction Using English as a Pivot

Authors

  • Jessica C. Ramírez
  • Masayuki Asahara
  • Yuji Matsumoto
Abstract

We present the results of research aimed at automatically creating a multilingual thesaurus based on the freely available resources of Wikipedia and WordNet. Our goal is to increase resources for natural language processing tasks such as machine translation targeting the Japanese-Spanish language pair. Given the scarcity of resources, we use existing English resources as a pivot for creating a trilingual Japanese-Spanish-English thesaurus. Our approach consists of extracting translation tuples from Wikipedia and disambiguating them by mapping them to WordNet word senses. We present results comparing two methods of disambiguation, the first using a vector space model (VSM) on Wikipedia article texts and WordNet definitions, and the second using categorical information extracted from Wikipedia. We find that mixing the two methods produces favorable results. Using the proposed method, we have constructed a multilingual Spanish-Japanese-English thesaurus consisting of 25,375 entries. The same method can be applied to any pair of languages that are linked to English in Wikipedia.
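The two steps named in the abstract (pivoting through English, then VSM-based sense disambiguation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy dictionaries stand in for Wikipedia interlanguage links and WordNet definitions, the function names and sense labels are hypothetical, and raw term counts are assumed where the paper may use a different weighting.

```python
import math
import re
from collections import Counter


def pivot_tuples(en_to_ja, en_to_es):
    """Join two English-keyed link tables on their shared English titles."""
    shared = sorted(en_to_ja.keys() & en_to_es.keys())
    return [(en_to_ja[en], en_to_es[en], en) for en in shared]


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens (raw term frequency)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def pick_sense(article_text, glosses):
    """Return the sense label whose definition is closest to the article text."""
    article_vec = bag_of_words(article_text)
    return max(glosses, key=lambda s: cosine(article_vec, bag_of_words(glosses[s])))


# Toy stand-ins for interlanguage links (assumed data, not from the paper).
en_to_ja = {"Bank": "銀行", "Tokyo": "東京"}
en_to_es = {"Bank": "Banco", "Madrid": "Madrid"}
tuples = pivot_tuples(en_to_ja, en_to_es)

# Toy stand-ins for WordNet definitions; the sense labels are hypothetical.
glosses = {
    "sense_river": "sloping land beside a body of water such as a river",
    "sense_finance": "a financial institution that accepts deposits and lends money",
}
article = ("A bank is a financial institution that accepts deposits "
           "from the public and lends money.")
best = pick_sense(article, glosses)
```

Only English titles present in both link tables yield a tuple, which is why the approach extends to any language pair linked to English in Wikipedia.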


Similar resources

Multi-lingual Sentence Generation from the PIVOT Interlingua

This paper proposes a strategy for French and Spanish sentence generation systems, based on the English generation system. The English generation model consists of four procedures: conceptual wording (sentence-structure planning), syntactic selection, ordering, and morphological generation. The analysis of linguistic similarities and differences between English, French and Spanish reveals that a...


Automatic Construction Of A Transfer Dictionary Considering Directionality

In this paper, we show how to construct a transfer dictionary automatically. Dictionary construction, one of the most difficult tasks in developing a machine translation system, is expensive. To avoid this problem, we investigate how we build a dictionary using existing linguistic resources. Our algorithm can be applied to any language pairs, but for the present we focus on building a Korean-to...


Working with Russian Queries for the GIRT, Bilingual and Multilingual CLEF Tasks

For our activities within the CLEF 2001 evaluation, Berkeley group one participated in the bilingual, multilingual and GIRT tasks, focussing on the use of Russian queries. Performance on the Russian queries → English documents bilingual task was excellent, comparable to performance using German queries. For the multilingual task we utilized English as a pivot language between Russian and German a...


Cultural Influence on the Expression of Cathartic Conceptualization in English and Spanish: A Corpus-Based Analysis

This paper investigates the conceptualization of emotional release from a cognitive linguistics perspective (Cognitive Metaphor Theory). The metaphor WEEPING IS A MEANS OF LIBERATING CONTAINED EMOTIONS is grounded in universal embodied cognition and is reflected in linguistic expressions in English and Spanish. Lexicalization patterns which encapsulate this conceptualization i...


The PENMAN Project on Knowledge-Based Machine Translation

of an integrated knowledge-based machine-aided translation system called PANGLOSS. The ISI-specific work includes the development of English sentence generation and sentence planning capabilities and the construction of an Ontology of concepts to act as the semantic lexicon for all modules of the system as a whole. In addition, we continue to enhance Penman's existing generation technology, to ...



Publication date: 2008